Draft versus finished sequence data for DNA and protein diagnostic signature development
نویسندگان
چکیده
Sequencing pathogen genomes is costly, demanding careful allocation of limited sequencing resources. We built a computational Sequencing Analysis Pipeline (SAP) to guide decisions regarding the amount of genomic sequencing necessary to develop high-quality diagnostic DNA and protein signatures. SAP uses simulations to estimate the number of target genomes and close phylogenetic relatives (near neighbors or NNs) to sequence. We use SAP to assess whether draft data are sufficient or finished sequencing is required using Marburg and variola virus sequences. Simulations indicate that intermediate to high-quality draft with error rates of 10(-3)-10(-5) (approximately 8x coverage) of target organisms is suitable for DNA signature prediction. Low-quality draft with error rates of approximately 1% (3x to 6x coverage) of target isolates is inadequate for DNA signature prediction, although low-quality draft of NNs is sufficient, as long as the target genomes are of high quality. For protein signature prediction, sequencing errors in target genomes substantially reduce the detection of amino acid sequence conservation, even if the draft is of high quality. In summary, high-quality draft of target and low-quality draft of NNs appears to be a cost-effective investment for DNA signature prediction, but may lead to underestimation of predicted protein signatures.
منابع مشابه
Tough Mining
Caenorhabditis elegans, a 1-mm soil-dwelling roundworm with 959 cells, may be the best-understood multicellular organism on the planet. As the most " pared-down'' animal that shares essential features of human biology—from embryogenesis to aging—C. elegans is a favorite subject for studying how genes control these processes. The way these genes work in worms helps scientists understand how dise...
متن کاملAttitudes towards English Language Norms in the Expanding Circle: Development and Validation of a new Model and Questionnaire
This paper describes the development and validation of a new model and questionnaire to measure Iranian English as a foreign language learners’ attitudes towards the use of native versus non-native English language norms. Based on a comprehensive review of the related literature and interviews with domain experts, five factors were identified. A draft version of a questionnaire based on those f...
متن کاملDecoding the human genome sequence.
The year 2000 is marked by the production of the sequence of the human genome. A 'working draft' of high quality sequence covering 90% of the genome has been determined and a quarter is in finished form, including the first two completed chromosomes. All sequence data from the project is made freely available to the community via the Internet, for further analysis and exploitation. The challeng...
متن کاملI-3: Human Y Chromosome Proteome Project 2012 Update
The Human Genome Project has generated a blueprint for the approximately 20,300 gene-encoded proteins potentially active in any of 230 cell types that make up the human body (human proteome). However, based on the UniProtKB/Swiss-Prot database content, about 6000 of at the protein level; for many others, there is very little information related to protein function, abundance, subcellular locali...
متن کاملDetection and Characterization of Weissellicin 110, a Bacteriocin Produced by Weissella cibaria
Background: Weissellicin 110 is the only bacteriocin reported in Weissella cibaria up to now. This bacteriocin represents several unique features. This is the first report on the gene sequence that encodes for the bacteriocin. Objectives: Providing a rapid detection method to isolate the weissellicin 110 encoding gene and determination of the bacteriocin distribution were the objectives. Materi...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Nucleic Acids Research
دوره 33 شماره
صفحات -
تاریخ انتشار 2005